Sequencing and Raw Sequence Data Quality Control    ◾    5

for PCR DNA strand synthesis and barcode sequence for indexing the sample DNA. This
allows multiple samples to be sequenced in a single run (multiplexing); the DNA fragments
of each sample carry a unique barcode. After sequencing, the sample sequences can be
separated in the analysis by demultiplexing. In some sequencing applications, such as
gene expression (RNA-Seq) and epigenetics, an enrichment step is usually included to
amplify or to separate only the targeted sequences. In RNA applications, enrichment is
performed to separate mRNA from the other types of RNA. In epigenetics, the genomic
regions where protein interaction takes place can also be enriched. The enrichment is
usually performed with PCR, but there are other means as well. The library preparation of
the DNA/RNA is similar for all NGS technologies, but the sequencing process differs from
one technology to another. The sequences produced by the NGS technologies range between
75 and 400 base pairs (bp) in length. These sequences are called short reads. In general,
short-read sequencing (SRS) can be either single-end sequencing, which sequences the
forward strand only, or paired-end sequencing, which sequences both forward and reverse
strands. The latter reduces the chance of base-calling errors in the resulting sequence.
The DNA or RNA reads consist of the four nucleobase characters A, C, G, and T. However, a
sequence may also include N for an unresolved base.
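The idea of demultiplexing can be illustrated with a minimal Python sketch. It assumes, for simplicity, that the barcode sits at the start of each read, that all barcodes have the same length, and that matching is exact; the barcode table, the sample names, and the helper names are hypothetical and only serve the illustration. Real pipelines usually read the index separately and tolerate mismatches.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real barcodes come from the library kit.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by the barcode found at the start of each read.

    The barcode is trimmed from assigned reads. Reads whose prefix matches
    no known barcode are kept whole under "undetermined".
    """
    barcode_len = len(next(iter(barcodes)))  # assumes equal-length barcodes
    bins = defaultdict(list)
    for read in reads:
        prefix, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(prefix, "undetermined")
        bins[sample].append(insert if sample != "undetermined" else read)
    return dict(bins)


def fraction_unresolved(read):
    """Fraction of bases in a read that are reported as N (unresolved)."""
    return read.count("N") / len(read) if read else 0.0


reads = ["ACGTGGATTC", "TGCAANNCGT", "ACGTTTTTAA", "NNGTACGTAC"]
by_sample = demultiplex(reads)
```

Here the first and third reads are assigned to sample_1, the second to sample_2, and the last read, whose barcode contains unresolved bases, falls into the undetermined group.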

1.2.2.1  Roche 454 Technology

Roche 454 pioneered NGS when the 454 platform was used to sequence the whole genome of
Mycoplasma genitalium in 2006 [3]. The 454 technology uses pyrosequencing, which
depends on the sequential addition and incorporation of nucleotides along the DNA template.
The signal of the added nucleotide is quantitated by converting the released pyrophosphate
into a light signal in real time. Pyrosequencing is based on a series of enzymatic
reactions that lead to DNA synthesis and the release of inorganic pyrophosphate every
time a nucleotide is incorporated by the polymerase into the DNA chain. The intensity of
the light generated by the reactions is detected by a charge-coupled device (CCD) camera.
The order of the nucleotides in the DNA template is determined by quantitating the light
intensity.
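How a series of light measurements becomes a sequence can be sketched in a few lines of Python. This is a simplified illustration, not the instrument's base-calling algorithm. It assumes that one known nucleotide type is supplied per flow in a fixed cyclic order (the order "TACG" is an assumption here), and that each signal has already been normalized so that a value near 1.0 corresponds to a single incorporated base. The function name and the example values are hypothetical.

```python
def decode_flowgram(intensities, flow_order="TACG"):
    """Turn per-flow light intensities into a base sequence.

    Each flow supplies one known nucleotide type. The rounded intensity is
    taken as the number of identical bases incorporated in that flow, so a
    signal near 0 means no incorporation and a signal near 2 means a run of
    two identical bases.
    """
    bases = []
    for i, signal in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, int(signal + 0.5))  # round to the nearest whole base
        bases.append(nucleotide * count)
    # The result is the newly synthesized strand, which is complementary
    # to the template.
    return "".join(bases)


flowgram = [1.02, 0.05, 2.10, 0.96, 0.03, 1.10, 0.00, 2.95]
sequence = decode_flowgram(flowgram)
```

With these example values the decoded strand is TCCGAGGG. The sketch also shows why long runs of the same base are harder to call: the difference between, say, seven and eight incorporations is a small relative change in light intensity.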

In pyrosequencing, the DNA is fragmented and denatured into ssDNA. Two adaptors (A and B)
are ligated to the two ends of the fragments. Beads carrying single-stranded primers
complementary to adaptor A are added to the reaction. Adaptor A on the ssDNA template
hybridizes to the bead primers, which initiate the synthesis of the complementary strand.
This step can be repeated several times for enrichment (PCR). Then, the beads with the
ssDNA templates are placed into wells where sequencing takes place. A primer
complementary to adaptor B is added to initiate the addition of new nucleotides to the
complementary strand. This time, however, known nucleotides are added, one type at a
time. Every time a nucleotide is incorporated into the complementary strand, the hydroxyl
group of the last nucleotide reacts with the alpha phosphate of the incoming nucleotide,
releasing a two-phosphate compound called inorganic pyrophosphate (PPi). ATP sulfurylase
then converts the PPi into adenosine triphosphate (ATP) in the presence of adenosine
5′-phosphosulfate (APS); both the enzyme and APS are added to the reaction. Finally,
luciferase uses the ATP to oxidize luciferin, which emits light. Every time a nucleotide
is added, a light is emitted